Replace PyICU with Rust icu_segmenter crate #18553

Merged: anoadragon453 merged 15 commits into develop on Jul 3, 2025

@anoadragon453 (Member) commented Jun 13, 2025

Closes #18282.

Replaces PyICU, the Python wrapper around the C++ ICU library, with a native Rust solution: the icu_segmenter crate, which is part of the broader icu crate.

This eliminates the need for downstream users to install the libicu package if they want to have improved user search.

The model decided to use the default constructor for WordSegmenter, and I agree. Our "data" is the user's search query, which is always parsed on the fly and never stored, so it's OK if the segmentation behaviour changes between ICU releases.
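
To illustrate what "parsed on the fly" involves: a word segmenter typically reports break positions as byte offsets into the query, and the caller slices the words back out. The sketch below assumes that shape of output. It uses only the standard library, and `naive_breakpoints` is a hypothetical stand-in that breaks wherever the text switches between alphanumeric and other characters. It is not the icu_segmenter API and not the code from this PR; real ICU word-break rules are far richer.

```rust
/// Hypothetical stand-in for a real word segmenter (NOT icu_segmenter).
/// Returns byte offsets where the text switches between alphanumeric and
/// non-alphanumeric runs, plus the start (0) and the end (text.len()).
fn naive_breakpoints(text: &str) -> Vec<usize> {
    let mut breaks = vec![0];
    let mut prev: Option<bool> = None;
    for (idx, ch) in text.char_indices() {
        let is_word = ch.is_alphanumeric();
        if let Some(p) = prev {
            if p != is_word {
                // The character class changed, so a segment ends here.
                breaks.push(idx);
            }
        }
        prev = Some(is_word);
    }
    if !text.is_empty() {
        breaks.push(text.len());
    }
    breaks
}

/// Slice the query between consecutive breakpoints and keep only the
/// segments that contain at least one alphanumeric character, dropping
/// whitespace and punctuation runs.
fn words_from_breakpoints<'a>(text: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    breaks
        .windows(2)
        .map(|w| &text[w[0]..w[1]])
        .filter(|seg| seg.chars().any(char::is_alphanumeric))
        .collect()
}
```

For the query `"Hello World"`, `naive_breakpoints` gives `[0, 5, 6, 11]` and `words_from_breakpoints` turns that into `["Hello", "World"]`. Each query is split this way at search time, so a different set of breakpoints from a newer segmenter only affects how later queries are split.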

Note: This PR was mostly written by OpenAI Codex, before being reviewed by myself. Commits written by Codex are marked as those made by @anoadragon453-codex. Please use extra caution when reviewing them.

This PR updates the minimum supported Rust version (MSRV) to 1.82.0.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@anoadragon453 marked this pull request as ready for review June 13, 2025 17:05
@anoadragon453 requested a review from a team as a code owner June 13, 2025 17:05
@anoadragon453 (Member, Author) commented

Looks like CI now needs #18596 to pass:

        cargo rustc --lib --message-format=json-render-diagnostics --manifest-path rust/Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --
        error: rustc 1.81.0 is not supported by the following packages:
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
          [email protected] requires rustc 1.82
        Either upgrade rustc or select compatible dependency versions with
        `cargo update <name>@<current-ver> --precise <compatible-ver>`
        where `<compatible-ver>` is the latest version supporting rustc 1.81.0

And this PR effectively bumps our MSRV to 1.82.0.
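
The failure above comes down to a version comparison: the toolchain is rustc 1.81.0, and the listed packages declare a minimum of 1.82. Here is a minimal sketch of that check using only the standard library. `parse_version` and `toolchain_supports` are hypothetical names for illustration, and this is not how Cargo itself implements the check.

```rust
/// Parse "1.82" or "1.82.0" into (major, minor, patch).
/// Missing minor/patch components default to 0; anything non-numeric
/// (for example "nightly") or with more than three parts yields None.
fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = match parts.next() {
        Some(p) => p.parse().ok()?,
        None => 0,
    };
    let patch: u32 = match parts.next() {
        Some(p) => p.parse().ok()?,
        None => 0,
    };
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Returns Some(true) when the installed rustc meets the required minimum,
/// Some(false) when it is too old, and None when either string is unparseable.
fn toolchain_supports(rustc: &str, required: &str) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor, then patch.
    Some(parse_version(rustc)? >= parse_version(required)?)
}
```

With the values from the log, `toolchain_supports("1.81.0", "1.82")` is `Some(false)`, while a 1.82.0 toolchain satisfies the same requirement.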

@anoadragon453 requested a review from sandhose July 2, 2025 18:08
Co-authored-by: Quentin Gliech <[email protected]>
@anoadragon453 merged commit be4c95b into develop Jul 3, 2025
44 of 46 checks passed
@anoadragon453 deleted the anoa/codex/replace-pyicu-with-icu-rust-crate branch July 3, 2025 10:12
@anoadragon453 mentioned this pull request Jul 3, 2025
3 tasks
gentoo-bot pushed a commit to gentoo/gentoo that referenced this pull request Jul 15, 2025
PyICU was replaced with the Rust icu_segmenter crate.

See-also: element-hq/synapse#18553
Signed-off-by: Petr Vaněk <[email protected]>
OlegGirko added a commit to OlegGirko/synapse that referenced this pull request Jul 21, 2025
…)"

This reverts commit be4c95b.

The icu_segmenter is not packaged for Fedora and has many dependencies
that either are not packaged, or have older versions than needed.

So, it's easier to revert this change than package all of these crates.
anoadragon453 added a commit that referenced this pull request Jul 31, 2025
Due to `icu_segmenter` requiring 1.82.0. Missed in #18553
OlegGirko added a commit to OlegGirko/synapse that referenced this pull request Aug 2, 2025
…)"

This reverts commit be4c95b.

The icu_segmenter is not packaged for Fedora and has many dependencies
that either are not packaged, or have older versions than needed.

So, it's easier to revert this change than package all of these crates.
Development

Successfully merging this pull request may close these issues.

Consider switching away from PyICU as a dependency